
Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Neural Information Processing Systems

Learning object-centric representations of multi-object scenes is a promising approach towards machine intelligence, facilitating high-level reasoning and control from visual sensory data. However, current approaches for \textit{unsupervised object-centric scene representation} are incapable of aggregating information from multiple observations of a scene. As a result, these ``single-view'' methods form their representations of a 3D scene based only on a single 2D observation (view). Naturally, this leads to several inaccuracies, with these methods falling victim to single-view spatial ambiguities. To address this, we propose \textit{The Multi-View and Multi-Object Network (MulMON)}---a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views.


Review for NeurIPS paper: Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views

Neural Information Processing Systems

They also draw inspiration from prior work on iterative inference for VAEs and propose an inference mechanism that allows the model to efficiently learn an object-centric scene representation from multiple views of a scene containing multiple objects. During training, the model learns to infer [1, ..., K] objects (d-dimensional Gaussian latents) in the scene, where K upper-bounds the number of objects the model can recognize and is set to a sufficiently high value. Five views of a scene are presented during training, and the model is expected to reconstruct both the final rendering and object segmentations for a randomly queried novel viewpoint. They evaluate their model on GQN-Jaco and two variants of the CLEVR dataset. They compare their model to IODINE and GQN on object segmentation, novel queried-viewpoint prediction, and disentanglement; the results show that their method performs better both quantitatively and qualitatively. They also demonstrate that their model has learned well-disentangled feature-level representations.
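The multi-view iterative-inference loop described above can be sketched as follows. This is a minimal toy illustration, not the actual MulMON architecture: the function name, the random-projection "refinement network", and all dimensions are hypothetical stand-ins, chosen only to show the structure of aggregating K per-object Gaussian latents across successive views.

```python
import numpy as np

def multiview_inference_sketch(views, viewpoints, K=7, D=16, steps=4, lr=0.1, seed=0):
    """Toy sketch of MulMON-style inference: K per-object Gaussian latents
    (mu, logvar) are iteratively refined on each view in turn, so later views
    update the estimate formed from earlier ones.

    views      : list of flattened observations, one per viewpoint
    viewpoints : list of flattened viewpoint descriptors, same length
    Returns (mu, logvar), each of shape (K, D).
    The "network" here is a fixed random projection -- a placeholder for the
    paper's learned amortized refinement, not its real update rule.
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(scale=0.1, size=(K, D))   # K object slots, D-dim latents
    logvar = np.zeros((K, D))
    W = rng.normal(scale=0.1, size=(views[0].size + viewpoints[0].size, D))
    for x, v in zip(views, viewpoints):       # sequential aggregation over views
        inp = np.concatenate([x, v])
        for _ in range(steps):                # iterative refinement per view
            grad = inp @ W - mu               # stand-in for the learned update
            mu = mu + lr * grad
            logvar = logvar - lr * 0.01       # uncertainty shrinks with evidence
    return mu, logvar
```

The key structural point the sketch preserves is that inference is sequential over views: the latents produced after view t serve as the starting point for refinement on view t+1, which is what lets additional observations resolve single-view spatial ambiguities.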
